Dynamic variance adaptation using differenced maximum mutual information
Abstract
A conventional approach to noise-robust automatic speech recognition applies speech enhancement before recognition. However, speech enhancement cannot completely remove noise, so a mismatch between the enhanced speech and the acoustic model inevitably remains. Uncertainty decoding approaches mitigate this mismatch by accounting for the feature uncertainty during decoding. We have previously proposed dynamic variance adaptation, which estimates the feature uncertainty from adaptation data by maximizing the likelihood or a discriminative criterion such as maximum mutual information (MMI). In unsupervised adaptation, the transcriptions are obtained from a first recognition pass and therefore contain errors. Such errors are fatal when a discriminative criterion is used. In this paper, we investigate the recently proposed differenced MMI discriminative criterion for unsupervised dynamic variance adaptation, because it inherently includes a mechanism that mitigates the influence of transcription errors.
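As a rough illustration of the uncertainty decoding idea summarized in the abstract, the sketch below scores an enhanced feature vector against a diagonal-covariance Gaussian whose variance is inflated by a per-dimension feature uncertainty. It is a minimal sketch and not the authors' implementation: the function names are hypothetical, and the form of `dynamic_feature_variance` (uncertainty proportional to the squared difference between observed and enhanced features, scaled by a single parameter `alpha`) is an assumption that the abstract does not state.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def uncertainty_decoding_loglik(x_enh, mean, model_var, feat_var):
    """Score an enhanced feature with the model variance inflated
    by the per-dimension feature uncertainty (hypothetical helper)."""
    total_var = [mv + fv for mv, fv in zip(model_var, feat_var)]
    return log_gaussian_diag(x_enh, mean, total_var)


def dynamic_feature_variance(x_obs, x_enh, alpha):
    """ASSUMPTION (not from the abstract): the uncertainty grows with the
    squared amount the enhancement changed each feature dimension, and
    alpha is the scalar that adaptation would estimate."""
    return [alpha * (o - e) ** 2 for o, e in zip(x_obs, x_enh)]


# Illustrative values only: one feature dimension, unit model variance.
x_obs, x_enh = [5.0], [3.0]
mean, model_var = [0.0], [1.0]

plain = log_gaussian_diag(x_enh, mean, model_var)
feat_var = dynamic_feature_variance(x_obs, x_enh, alpha=2.0)
compensated = uncertainty_decoding_loglik(x_enh, mean, model_var, feat_var)
```

With these illustrative numbers the compensated score is higher than the plain score, because the inflated variance penalizes the residual mismatch less. In an adaptation setting, `alpha` would be the quantity tuned on adaptation data under a likelihood or discriminative criterion.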
Similar resources
Discriminative Linear Transforms for Speaker Adaptation
Linear transform adaptation techniques such as Maximum Likelihood Linear Regression (MLLR) are a popular and effective family of methods for speaker adaptation. MLLR estimates transform parameters for Gaussian means and variances using a maximum likelihood (ML) objective function. This paper discusses the use of an alternative discriminative objective function for linear transform estimation, w...
Improvements in linear transform based speaker adaptation
This paper presents three forms of linear transform based speaker adaptation that can give better performance than standard maximum likelihood linear regression (MLLR) adaptation. For unsupervised adaptation, a lattice-based technique is introduced which is compared to MLLR using confidence scores. For supervised adaptation, estimation of the adaptation matrices using the maximum mutual informa...
Semantic Text Clusters and Word Classes – the Dualism of Mutual Information and Maximum Likelihood
Dynamically modeling the word distribution in a variety of texts is a goal with various applications. For speech recognition a dynamic unigram may efficiently be used for the adaptation of longer ranging language models. For information retrieval it may be a good starting point to predict the most characteristic words in document dependent queries. This short paper presents two approaches for a...
Quasi Maximum-Likelihood Estimation of Dynamic Panel Data Models
This paper establishes the almost sure convergence and asymptotic normality of levels and differenced quasi maximum-likelihood (QML) estimators of dynamic panel data models. The QML estimators are robust with respect to initial conditions, conditional and time-series heteroskedasticity, and misspecification of the log-likelihood. The paper also provides an ECME algorithm for calculating levels ...
The CU-HTK March 2000 Hub5E Transcription System
This paper describes the Cambridge University HTK (CU-HTK) system developed for the NIST March 2000 evaluation of English conversational telephone speech transcription (Hub5E). A range of new features have been added to the HTK system used in the 1998 Hub5 evaluation, and the changes taken together have resulted in an 11% relative decrease in word error rate on the 1998 evaluation test set. Maj...
Publication date: 2012